Credit assignment in multiple goal embodied visuomotor behavior
The intrinsic complexity of the brain can lead one to set aside issues related to its relationships with the body, but the field of embodied cognition emphasizes that understanding brain function at the system level requires one to address the role of the brain-body interface. It has only recently been appreciated that this interface performs huge amounts of computation that does not have to be repeated by the brain, and thus affords the brain great simplifications in its representations. In effect, the brain’s abstract states can refer to coded representations of the world created by the body. But even if the brain can communicate with the world through abstractions, the severe speed limitations in its neural circuitry mean that vast amounts of indexing must be performed during development so that appropriate behavioral responses can be rapidly accessed. One way this could happen would be if the brain used a decomposition whereby behavioral primitives could be quickly accessed and combined. This realization motivates our study of independent sensorimotor task solvers, which we call modules, in directing behavior. The issue we focus on herein is how an embodied agent can learn to calibrate such individual visuomotor modules while pursuing multiple goals. The biologically plausible standard for module programming is reinforcement given during exploration of the environment. However, this formulation contains a substantial issue when sensorimotor modules are used in combination: the credit for their overall performance must be divided amongst them. We show that this problem can be solved and that diverse task combinations are beneficial in learning and not a complication, as usually assumed. Our simulations show that fast algorithms are available that allot credit correctly and are insensitive to measurement noise.
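The abstract does not spell out the algorithm, but the credit-assignment problem it describes can be made concrete with a toy sketch (the least-squares formulation and all names here are illustrative assumptions, not the paper's method): an agent observes only a noisy aggregate reward for each combination of active modules and must recover each module's individual contribution.

```python
import numpy as np

# Illustrative sketch only: credit assignment across modules posed as linear
# regression. Each trial activates a random subset of K modules and yields a
# noisy aggregate reward; least squares recovers per-module contributions.
rng = np.random.default_rng(0)
K, trials = 4, 500
true_value = np.array([1.0, 2.0, 0.5, 3.0])   # unknown per-module contributions

active = rng.integers(0, 2, size=(trials, K)).astype(float)  # task combinations
reward = active @ true_value + rng.normal(0.0, 0.3, size=trials)  # noisy total

estimate, *_ = np.linalg.lstsq(active, reward, rcond=None)
print(np.round(estimate, 2))  # close to true_value despite measurement noise
```

Note how diverse task combinations make the design matrix well conditioned; if every trial used the same module combination, credit could not be divided at all, mirroring the abstract's point that diverse combinations help rather than hinder learning.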
Solving Bongard Problems with a Visual Language and Pragmatic Reasoning
More than 50 years ago, Bongard introduced 100 visual concept learning problems as a testbed for intelligent vision systems. These problems are now known as Bongard problems. Although they are well known in the cognitive science and AI communities, only moderate progress has been made towards building systems that can solve a substantial subset of them. In the system presented here, visual features are extracted through image processing and then translated into a symbolic visual vocabulary. We introduce a formal language that allows representing complex visual concepts based on this vocabulary. Using this language and Bayesian inference, complex visual concepts can be induced from the examples that are provided in each Bongard problem. Contrary to other concept learning problems, the examples from which concepts are induced are not random in Bongard problems; instead, they are carefully chosen to communicate the concept, hence requiring pragmatic reasoning. Taking pragmatic reasoning into account, we find good agreement between the concepts with high posterior probability and the solutions formulated by Bongard himself. While this approach is far from solving all Bongard problems, it solves the largest fraction yet.
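The paper's formal visual language is not reproduced here, but the role of pragmatic reasoning can be illustrated with a deliberately tiny toy model (the hypothesis names and extensions are invented for the example): under a pedagogical "strong sampling" assumption, positive examples are drawn from the concept itself, yielding the size-principle likelihood P(x|h) = 1/|h|, which favors the most specific consistent concept.

```python
# Toy illustration (not the paper's formal visual language): Bayesian concept
# induction where examples are assumed to be chosen to communicate a concept.
# Under "strong sampling", P(x|h) = 1/|h| prefers the smallest consistent
# hypothesis, a simple form of pragmatic reasoning.
hypotheses = {
    "triangle": {"small triangle", "large triangle"},
    "small shape": {"small triangle", "small circle"},
    "any shape": {"small triangle", "large triangle", "small circle", "large circle"},
}
examples = ["small triangle", "large triangle"]  # positives shown to the learner

def posterior(pragmatic: bool) -> dict:
    scores = {}
    for name, extension in hypotheses.items():
        if not all(x in extension for x in examples):
            scores[name] = 0.0       # hypothesis inconsistent with an example
            continue
        lik = (1.0 / len(extension)) ** len(examples) if pragmatic else 1.0
        scores[name] = lik           # uniform prior, so posterior is prop. to likelihood
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

print(posterior(pragmatic=False))  # consistent hypotheses tie
print(posterior(pragmatic=True))   # "triangle" wins: smallest consistent concept
```

With random (weak) sampling the two consistent hypotheses tie, whereas the pragmatic learner concentrates its posterior on "triangle", the tightest concept that explains why exactly these examples were chosen.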
Adversarially Tuned Scene Generation
Generalization performance of trained computer vision systems that use computer graphics (CG) generated data is not yet effective due to the 'domain shift' between virtual and real data. Although simulated data augmented with a few real-world samples has been shown to mitigate domain shift and improve the transferability of trained models, guiding or bootstrapping the virtual data generation with distributions learnt from the target real-world domain is desirable, especially in fields where annotating even a few real images is laborious (such as semantic labeling and intrinsic images). In order to address this problem in an unsupervised manner, our work combines recent advances in CG (which aims to generate stochastic scene layouts coupled with large collections of 3D object models) and generative adversarial training (which aims to train generative models by measuring the discrepancy between generated and real data in terms of their separability in the space of a deep discriminatively trained classifier). Our method uses iterative estimation of the posterior density of the prior distributions for a generative graphical model, within a rejection sampling framework. Initially, we assume uniform distributions as priors on the parameters of a scene described by a generative graphical model. As iterations proceed, the prior distributions get updated to distributions that are closer to the (unknown) distributions of the target data. We demonstrate the utility of adversarially tuned scene generation on two real-world benchmark datasets (CityScapes and CamVid) for traffic scene semantic labeling with a deep convolutional net (DeepLab). We realized performance improvements of 2.28 and 3.14 points (on the IoU metric) between the DeepLab models trained on simulated sets prepared from the scene generation
models before and after tuning to CityScapes and CamVid, respectively.
Comment: 9 pages, accepted at CVPR 2017
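A minimal sketch of the loop the abstract describes, with stand-ins for every component (the renderer, the discriminator, and the one-parameter scene model are all hypothetical): sample scene parameters from the current priors, score rendered samples with a discriminator, keep samples by rejection sampling, and refit the priors to what survives.

```python
import numpy as np

rng = np.random.default_rng(1)

def render(theta):
    return theta  # stub renderer: treat the parameter itself as the "image"

def discriminator_realness(x):
    # Stub discriminator score in (0, 1]; "real" data concentrates around 2.0.
    return np.exp(-0.5 * ((x - 2.0) / 0.4) ** 2)

# Start from a broad uniform prior on one scene parameter and iteratively
# tighten it by rejection sampling against the discriminator score.
low, high = 0.0, 10.0
for it in range(5):
    theta = rng.uniform(low, high, size=2000)           # sample scene parameters
    accept = rng.uniform(size=theta.size) < discriminator_realness(render(theta))
    kept = theta[accept]
    low, high = kept.min(), kept.max()  # crude refit of the uniform prior
    print(f"iter {it}: prior support [{low:.2f}, {high:.2f}], accepted {kept.size}")
```

The prior support contracts toward the region the discriminator deems realistic, which is the qualitative behavior the abstract attributes to its adversarially tuned generative graphical model.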
Bayesian Classifier Fusion with an Explicit Model of Correlation
Combining the outputs of multiple classifiers or experts into a single probabilistic classification is a fundamental task in machine learning, with broad applications from classifier fusion to expert opinion pooling. Here we present a hierarchical Bayesian model of probabilistic classifier fusion based on a new correlated Dirichlet distribution. This distribution explicitly models positive correlations between marginally Dirichlet-distributed random vectors, thereby allowing explicit modeling of correlations between base classifiers or experts. The proposed model naturally accommodates the classic Independent Opinion Pool and other independent fusion algorithms as special cases. It is evaluated by uncertainty reduction and correctness of fusion on synthetic and real-world data sets. We show that a change in performance of the fused classifier due to uncertainty reduction can be Bayes optimal even for highly correlated base classifiers.
Comment: 12 pages, 4 figures, 1 table; revised title and Fig. 2, added real data set Bookies
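The correlated Dirichlet distribution itself is not reproduced here, but the Independent Opinion Pool that the model contains as a special case is simple enough to sketch: class-probability vectors are fused by a normalized element-wise product.

```python
import numpy as np

def independent_opinion_pool(distributions):
    """Fuse class-probability vectors by a normalized element-wise product.

    This is the classic Independent Opinion Pool, which the proposed
    correlated-Dirichlet model contains as the special case of
    conditionally independent base classifiers.
    """
    fused = np.prod(np.asarray(distributions), axis=0)
    return fused / fused.sum()

# Two base classifiers that mildly agree on class 0:
p1 = np.array([0.6, 0.3, 0.1])
p2 = np.array([0.5, 0.2, 0.3])
print(independent_opinion_pool([p1, p2]))  # sharper than either input
```

When base classifiers are positively correlated, this product multiplies shared evidence repeatedly and becomes overconfident, which is precisely the failure mode that an explicit correlation model is meant to address.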
Multimodal Uncertainty Reduction for Intention Recognition in Human-Robot Interaction
Assistive robots can potentially improve the quality of life and personal independence of elderly people by supporting everyday activities. To guarantee a safe and intuitive interaction between human and robot, human intentions need to be recognized automatically. As humans communicate their intentions multimodally, the use of multiple modalities for intention recognition may not only increase robustness against the failure of individual modalities but, in particular, reduce the uncertainty about the intention to be predicted. This is desirable because, particularly in direct interaction between robots and potentially vulnerable humans, both minimal uncertainty about the situation and knowledge about this actual uncertainty are necessary. Thus, in contrast to existing methods, this work introduces a new approach to multimodal intention recognition that focuses on uncertainty reduction through classifier fusion. For the four considered modalities (speech, gestures, gaze directions, and scene objects), individual intention classifiers are trained, all of which output a probability distribution over all possible intentions. By combining these output distributions using the Bayesian method Independent Opinion Pool, the uncertainty about the intention to be recognized can be decreased. The approach is evaluated in a collaborative human-robot interaction task with a 7-DoF robot arm. The results show that fused classifiers combining multiple modalities outperform the respective individual base classifiers with respect to increased accuracy, robustness, and
reduced uncertainty.
Comment: Submitted to IROS 2019
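The reported uncertainty reduction can be quantified with Shannon entropy; the following snippet (with made-up per-modality distributions, not the study's data) fuses four modality outputs by Independent Opinion Pool and compares entropies.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # drop zero entries (0 log 0 := 0)
    return -np.sum(p * np.log2(p))    # Shannon entropy in bits

# Hypothetical per-modality distributions over three intentions:
speech  = np.array([0.70, 0.20, 0.10])
gesture = np.array([0.50, 0.30, 0.20])
gaze    = np.array([0.60, 0.25, 0.15])
objects = np.array([0.55, 0.25, 0.20])

fused = speech * gesture * gaze * objects  # Independent Opinion Pool ...
fused /= fused.sum()                       # ... up to normalization

for name, p in [("speech", speech), ("gesture", gesture),
                ("gaze", gaze), ("objects", objects), ("fused", fused)]:
    print(f"{name:8s} H = {entropy(p):.2f} bits")
# The fused distribution has lower entropy than any single modality.
```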
Probabilistic inverse optimal control with local linearization for non-linear partially observable systems
Inverse optimal control methods can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, requires the control signals to be known, or is limited to fully observable or linear systems. This paper introduces a probabilistic approach to inverse optimal control for stochastic non-linear systems with missing control signals and partial observability that unifies existing approaches. By using an explicit model of the noise characteristics of the sensory and control systems of the agent in conjunction with local linearization techniques, we derive an approximate likelihood for the model parameters, which can be computed within a single forward pass. We evaluate our proposed method on stochastic and partially observable versions of classic control tasks, a navigation task, and a manual reaching task. The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.
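The paper's derivation is not reproduced here, but its key ingredient can be sketched for the simplest case: for a (locally) linear-Gaussian system, the likelihood of an observed trajectory under candidate model parameters factorizes over the Kalman filter's one-step prediction residuals, so it is computable in a single forward pass. The scalar system below is an illustrative stand-in.

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(y, a, q, r):
    """Trajectory log-likelihood via Kalman-filter innovations (scalar case):
    x[t+1] = a*x[t] + process noise (var q), y[t] = x[t] + obs. noise (var r)."""
    mean, var, ll = 0.0, 1.0, 0.0
    for obs in y:
        pred_mean = a * mean                 # predict latent state
        pred_var = a * a * var + q
        innov_var = pred_var + r             # innovation (residual) variance
        ll += norm.logpdf(obs, pred_mean, np.sqrt(innov_var))
        gain = pred_var / innov_var          # Kalman update
        mean = pred_mean + gain * (obs - pred_mean)
        var = (1.0 - gain) * pred_var
    return ll

# Simulate observations, then scan candidate dynamics parameters:
rng = np.random.default_rng(2)
a_true, q, r = 0.9, 0.05, 0.1
x, ys = 0.0, []
for _ in range(200):
    x = a_true * x + rng.normal(0.0, np.sqrt(q))
    ys.append(x + rng.normal(0.0, np.sqrt(r)))
for a in (0.5, 0.9, 1.0):
    print(a, round(log_likelihood(ys, a, q, r), 1))  # peaks near a_true
```

Local linearization generalizes this idea to non-linear systems by supplying time-varying linear-Gaussian approximations at each step.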
Looking for Image Statistics: Active Vision With Avatars in a Naturalistic Virtual Environment
The efficient coding hypothesis posits that sensory systems are tuned to the regularities
of their natural input. The statistics of natural image databases have been the topic
of many studies, which have revealed biases in the distribution of orientations that are
related to neural representations as well as behavior in psychophysical tasks. However,
commonly used natural image databases contain images taken with a camera with a
planar image sensor and a limited field of view. Thus, these images do not incorporate the physical properties of the visual system or its active use, which reflects body and eye movements. Here, we investigate quantitatively whether the active use of the visual system influences image statistics across the visual field by simulating visual behaviors of an avatar in a naturalistic virtual environment. Images with a field of view of 120° were generated during exploration of a virtual forest environment for both a human and a cat avatar. The physical properties of the visual system were taken into account by projecting the images onto idealized retinas according to models of the eyes’ geometrical optics.
Crucially, different active gaze behaviors were simulated to obtain image ensembles
that allow investigating the consequences of active visual behaviors on the statistics
of the input to the visual system. In the central visual field, the statistics of the virtual
images matched photographic images regarding their power spectra and a bias in
edge orientations toward cardinal directions. At larger eccentricities, the cardinal bias
was superimposed with a gradually increasing radial bias. The strength of this effect depended on the active visual behavior and the physical properties of the eye. There were also significant differences between the upper and lower visual field, which became stronger depending on how the environment was actively sampled. Taken together, the results show that quantitatively relating natural image statistics to neural representations and psychophysical behavior requires taking into account not only the structure of the environment but also the physical properties of the visual system and its active use in behavior.
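A minimal sketch of the two statistics the study analyzes, the orientation-averaged power spectrum and a magnitude-weighted histogram of local edge orientations, computed here on a synthetic random image since the rendered retinal images are not available.

```python
import numpy as np

# A random synthetic image stands in for the study's rendered retinal images.
rng = np.random.default_rng(3)
img = rng.standard_normal((256, 256))

# Power spectrum, radially averaged over spatial-frequency annuli:
spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
fy, fx = np.indices(spec.shape) - 128          # frequencies relative to DC
radius = np.hypot(fx, fy).astype(int)
counts = np.bincount(radius.ravel())
radial_power = (np.bincount(radius.ravel(), weights=spec.ravel())
                / np.maximum(counts, 1))       # avoid division by empty bins

# Edge orientations from image gradients, weighted by gradient magnitude;
# a cardinal bias would appear as peaks at 0° and 90°:
gy, gx = np.gradient(img)
orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
hist, edges = np.histogram(orientation, bins=18, range=(0, 180),
                           weights=np.hypot(gx, gy))
print(radial_power[:5])
print(hist / hist.sum())
```

For natural or rendered images, the radial power profile typically falls off with spatial frequency and the orientation histogram is anisotropic; the flat spectrum and uniform histogram of this white-noise stand-in make a useful null reference.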